Bernoulli 15(3), 2009, 774-798 
DOI: 10.3150/08-BEJ176 



Optimal scaling of the random walk 
Metropolis on elliptically symmetric unimodal 
targets 

CHRIS SHERLOCK 1 and GARETH ROBERTS 2 

1 Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, UK. 
E-mail: c.sherlock@lancaster.ac.uk 

2 Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK. 
E-mail: gareth.o.roberts@warwick.ac.uk 

Scaling of proposals for Metropolis algorithms is an important practical problem in MCMC 
implementation. Criteria for scaling based on empirical acceptance rates of algorithms have 
been found to work consistently well across a broad range of problems. Essentially, proposal 
jump sizes are increased when acceptance rates are high and decreased when rates are low. 
In recent years, considerable theoretical support has been given for rules of this type which 
work on the basis that acceptance rates around 0.234 should be preferred. This has been based 
on asymptotic results that approximate high dimensional algorithm trajectories by diffusions. 
In this paper, we develop a novel approach to understanding 0.234 which avoids the need for 
diffusion limits. We derive explicit formulae for algorithm efficiency and acceptance rates as 
functions of the scaling parameter. We apply these to the family of elliptically symmetric target 
densities, where further illuminating explicit results are possible. Under suitable conditions, we 
verify the 0.234 rule for a new class of target densities. Moreover, we can characterise cases 
where 0.234 fails to hold, either because the target density is too diffuse in a sense we make 
precise, or because the eccentricity of the target density is too severe, again in a sense we make 
precise. We provide numerical verifications of our results. 

Keywords: optimal acceptance rate; optimal scaling; random walk Metropolis 

1. Introduction 

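The Metropolis-Hastings updating scheme provides a very general class of algorithms for 
obtaining a dependent sample from a target distribution, $\pi(\cdot)$. Given the current value 
$\mathbf{X}$, a new value $\mathbf{X}^*$ is proposed from a pre-specified Lebesgue density $q(\mathbf{x}^*|\mathbf{x})$ and is 
then accepted with probability $\alpha(\mathbf{x},\mathbf{x}^*) = \min(1, (\pi(\mathbf{x}^*)q(\mathbf{x}|\mathbf{x}^*))/(\pi(\mathbf{x})q(\mathbf{x}^*|\mathbf{x})))$. If the 
proposed value is accepted it becomes the next current value ($\mathbf{X}' \leftarrow \mathbf{X}^*$), otherwise the 
current value is left unchanged ($\mathbf{X}' \leftarrow \mathbf{X}$). 

Consider the d-dimensional random walk Metropolis (RWM) [7]: 

$$q(\mathbf{x}^*|\mathbf{x}) = \frac{1}{\lambda^d}\, r\left(\frac{\mathbf{x}^*-\mathbf{x}}{\lambda}\right) = \frac{1}{\lambda^d}\, r\left(\frac{\mathbf{y}^*}{\lambda}\right), \tag{1}$$

where $\mathbf{y}^* := \mathbf{x}^* - \mathbf{x}$ is the proposed jump, and $r(\mathbf{y}) = r(-\mathbf{y})$ for all $\mathbf{y}$. In this case the 
acceptance probability simplifies to 

$$\alpha(\mathbf{x},\mathbf{x}^*) = \min\left(1, \frac{\pi(\mathbf{x}^*)}{\pi(\mathbf{x})}\right). \tag{2}$$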

Now consider the behaviour of the RWM as a function of the scale of proposed jumps, 
$\lambda$, and some measure of the scale of variability of the target distribution, $\eta$. If $\lambda \ll \eta$ then, 
although proposed jumps are often accepted, the chain moves slowly and exploration of 
the target distribution is relatively inefficient. If $\lambda \gg \eta$ then many proposed jumps are 
not accepted, the chain rarely moves and exploration is again inefficient. This suggests 
that given a particular target and form for the jump proposal distribution, there may 
exist a finite scale parameter for the proposal with which the algorithm will explore the 
target as efficiently as possible. We are concerned with the definition and existence of an 
optimal scaling, its asymptotic properties and the process of finding it. We start with a 
brief review of current literature on the topic. 
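
As an informal illustration of the trade-off just described, the following Python sketch runs an RWM chain with Gaussian jumps on a standard Gaussian target. The function names (`rwm_chain`, `log_std_gaussian`), the choice of target and the particular scales tried are assumptions made for this example only; they are not taken from the analysis in this paper.

```python
import math
import random


def rwm_chain(log_target, x0, lam, n_steps, rng):
    """Run a random walk Metropolis chain with Gaussian jumps of scale lam.

    Returns the list of visited states and the fraction of accepted proposals.
    """
    x = list(x0)
    log_px = log_target(x)
    chain = [tuple(x)]
    accepted = 0
    for _ in range(n_steps):
        # Proposed jump y* = lam * z with z standard Gaussian, so r(y) = r(-y).
        x_star = [xi + lam * rng.gauss(0.0, 1.0) for xi in x]
        log_p_star = log_target(x_star)
        # Accept with probability min(1, pi(x*) / pi(x)).
        log_ratio = log_p_star - log_px
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x, log_px = x_star, log_p_star
            accepted += 1
        chain.append(tuple(x))
    return chain, accepted / n_steps


def log_std_gaussian(x):
    """Log density, up to an additive constant, of a standard Gaussian target."""
    return -0.5 * sum(xi * xi for xi in x)


if __name__ == "__main__":
    rng = random.Random(1)
    for lam in (0.05, 2.4, 50.0):
        _, rate = rwm_chain(log_std_gaussian, [0.0], lam, 20000, rng)
        print(f"lambda = {lam:6.2f}  acceptance rate = {rate:.3f}")
```

With a very small scale almost every proposal is accepted but each move is tiny; with a very large scale almost every proposal is rejected. Both extremes explore the target slowly.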



1.1. Existing results for optimal scaling of the RWM 

Existing literature on this problem has concentrated on obtaining a limiting diffusion 
process from a sequence of Metropolis algorithms with increasing dimension. The speed 
of this limiting diffusion is then maximised with respect to a transformation of the scale 
parameter to find the optimally scaled algorithm. Roberts et al. [9] first follow this 
program for densities of the form 

$$\pi(\mathbf{x}) = \prod_{i=1}^{d} f(x_i) \tag{3}$$

using Gaussian jump proposals, $\mathbf{Y}^{(d)} \sim N(\mathbf{0}, \lambda_d^2 \mathbf{I}_d)$. Here and throughout this article $\mathbf{I}_d$ 
denotes the d-dimensional identity matrix. For high dimensional targets which satisfy certain 
moment conditions it is shown that the optimal value of the scale parameter satisfies 
$d^{1/2}\lambda_d = l$, for some fixed $l$ which is dependent on the roughness of the target. Particularly 
appealing, however, from a practical perspective, is the following distribution-free 
interpretation of the optimal scaling for the class of distributions given by (3). It is the 
scaling that leads to the proportion 0.234 of proposed moves being accepted. 
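
To show how an acceptance-rate rule of this kind can be used in practice, the sketch below adjusts the proposal scale between batches so that the observed acceptance rate drifts towards 0.234. The multiplicative update, the $1/\sqrt{k}$ gain and the names `tune_scale` and `log_std_gaussian` are assumptions chosen for illustration; the paper does not prescribe this particular tuning scheme.

```python
import math
import random


def log_std_gaussian(x):
    """Log density, up to an additive constant, of a standard Gaussian target."""
    return -0.5 * sum(xi * xi for xi in x)


def tune_scale(log_target, x0, lam0, n_batches, batch_size, rng, target_rate=0.234):
    """Adapt the RWM scale so that the batch acceptance rate approaches target_rate.

    The update rule (log-scale step with gain 1/sqrt(k)) is a hypothetical
    choice for this sketch, not a rule taken from the paper.
    """
    x = list(x0)
    log_px = log_target(x)
    log_lam = math.log(lam0)
    for k in range(1, n_batches + 1):
        lam = math.exp(log_lam)
        accepted = 0
        for _ in range(batch_size):
            x_star = [xi + lam * rng.gauss(0.0, 1.0) for xi in x]
            log_p_star = log_target(x_star)
            log_ratio = log_p_star - log_px
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                x, log_px = x_star, log_p_star
                accepted += 1
        rate = accepted / batch_size
        # Too many acceptances: enlarge the jumps; too few: shrink them.
        log_lam += (rate - target_rate) / math.sqrt(k)
    return math.exp(log_lam), x


if __name__ == "__main__":
    rng = random.Random(3)
    lam, _ = tune_scale(log_std_gaussian, [0.0] * 10, 5.0, 150, 100, rng)
    print(f"tuned scale for a 10-dimensional Gaussian target: {lam:.2f}")
```

The starting scale is deliberately far too large, so the early batches accept almost nothing and the scale shrinks until the acceptance rate settles near the target.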

Empirically this "0.234" rule has been observed to be approximately right much more 
generally. Extensions and generalisations of this result can be found in Roberts and 
Rosenthal [10], which also provides an accessible review of the area, and in Bedard [2], 
Breyer and Roberts [3] and Roberts [8]. The focus of much of this work is in trying to 
characterise when the "0.234" rule holds and to explain how and why it breaks down in 
other situations. 

One major disadvantage of the diffusion limit work is its reliance on asymptotics in 
the dimensionality of the problem. Although it is often empirically observed that the 
limiting behaviour can be seen in rather small dimensional problems (see, e.g., Gelman 
et al. [5]) it is difficult to quantify this in any general way. 

In this paper we adopt a finite dimensional approach, deriving and working with ex- 
plicit solutions for algorithm efficiency and overall acceptance rates. 

1.2. Efficiency and expected acceptance rate 

In order to consider the problem of optimising the algorithm, an optimisation criterion 
needs to be chosen. Unfortunately this is far from unique. In practical MCMC, interest 
may lie in the estimation of a collection of expected functionals. For any one of these 
functionals, $f$ say, a plausible criterion to minimise is the stationary integrated 
autocorrelation time for $f$ given by 

$$\tau_f = 1 + 2\sum_{i=1}^{\infty} \mathrm{Cor}(f(\mathbf{X}_0), f(\mathbf{X}_i)).$$

Under appropriate conditions, the MCMC central limit theorem for $\{f(\mathbf{X}_i)\}$ gives a 
Monte Carlo variance proportional to $\tau_f$. This approach has two major disadvantages. 
First, estimation of $\tau_f$ is notoriously difficult, and second, this optimisation criterion 
gives a different solution for the "optimal" chain for different functionals $f$. 

In the diffusion limit, the problem of non-uniqueness of the optimal chain is avoided 
since in all cases $\tau_f$ is proportional to the inverse of the diffusion speed. This suggests 
that plausible criteria might be based on optimising properties of single increments of 
the chain. 

The most general target distributions that we shall examine here possess elliptical 
symmetry. If a d-dimensional target distribution has elliptical contours then there is a 
simple invertible linear transformation $\mathbf{T} : \mathbb{R}^d \to \mathbb{R}^d$ which produces a spherically 
symmetric target. To fix $\mathbf{T}$ (up to an arbitrary rotation) we define it to be the transformation 
that produces a spherically symmetric target with unit scale parameter. Here the exact 
meaning of "unit scale parameter" may be decided arbitrarily or by convention. The scale 
parameter $\beta_i$ along the ith principal axis of the ellipse is the ith eigenvalue of $\mathbf{T}^{-1}$. 

Let $\mathbf{X}$ and $\mathbf{X}'$ be consecutive elements of a stationary chain exploring a d-dimensional 
target distribution. A natural efficiency measure for elliptical targets is Mahalanobis 
distance (see, for example, Krzanowski [6]): 

$$S_d^2 := E\left[\sum_{i=1}^{d} \frac{1}{\beta_i^2}(X_i' - X_i)^2\right] = E\left[\sum_{i=1}^{d} \frac{1}{\beta_i^2} Y_i^2\right], \tag{4}$$

where $X_i'$ and $X_i$ are the components of $\mathbf{X}'$ and $\mathbf{X}$ along the ith principal axis and $Y_i$ 
are components of the realised jump $\mathbf{Y} = \mathbf{X}' - \mathbf{X}$. We refer to this as the expected square 
jump distance, or ESJD. We will relate ESJD to expected acceptance rate (EAR) which 
we define as $\bar{\alpha}_d := E[\alpha(\mathbf{X}, \mathbf{X}^*)]$, where the expectation is with respect to the joint law 
for the current value $\mathbf{X}$ and the proposed value $\mathbf{X}^*$. Note that we are not interested in 
the value of the ESJD itself but only in the scaling and EAR at which the maximum 
ESJD is attained. 
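
Both quantities have simple empirical counterparts that can be computed from the output of a chain. The sketch below is illustrative only: the function names are hypothetical, and the chain is assumed to be expressed already in principal-axis coordinates, so that each component can be divided by its own scale parameter.

```python
def empirical_esjd(chain, betas):
    """Average squared Mahalanobis jump between consecutive states.

    chain is a sequence of equal-length tuples in principal-axis coordinates;
    betas holds one scale parameter per principal axis.
    """
    total = 0.0
    for x, x_next in zip(chain[:-1], chain[1:]):
        total += sum(((b - a) / beta) ** 2 for a, b, beta in zip(x, x_next, betas))
    return total / (len(chain) - 1)


def empirical_acceptance_rate(chain):
    """Fraction of transitions in which the chain moved.

    For a proposal with a Lebesgue density a zero jump has probability zero,
    so "moved" and "proposal accepted" coincide almost surely.
    """
    moves = sum(1 for x, x_next in zip(chain[:-1], chain[1:]) if x != x_next)
    return moves / (len(chain) - 1)
```

For example, the three transitions in `[(0.0, 0.0), (2.0, 0.0), (2.0, 0.0), (2.0, 3.0)]` with `betas = (2.0, 1.0)` contribute squared jumps of 1, 0 and 9.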

1.3. Outline of this paper 

The body of this paper investigates the RWM algorithm on spherically and then ellipti- 
cally symmetric unimodal targets. Section 2 considers finite dimensional algorithms on 
spherically symmetric unimodal targets and derives explicit formulae for ESJD and EAR 
in terms of the scale parameter associated with the proposed jumps (Theorem 1). Several 
example algorithms are then introduced and the forms of $\bar{\alpha}_d(\lambda)$ and $S_d^2(\lambda)$ are derived for 
specific values of d either analytically or by numerical integration. Numerical results for 
the relationship between the optimal acceptance rate and dimension are then described; 
in most of these examples the limiting optimal acceptance rate appears to be less than 
0.234. 

The explicit formulae in Theorem 1 involve the target's marginal one-dimensional 
distribution function. Theorem 2 of Section 3 provides a limiting form for the marginal 
one-dimensional distribution function of a spherically symmetric random variable as $d \to \infty$ 
and Theorem 3 combines this with a result from measure theory to provide limiting 
forms for EAR and ESJD as $d \to \infty$. A natural next step would be to use the limiting 
ESJD to estimate a limiting optimal scale parameter rather than directly examining 
the limit of the optimal scale parameters of the finite dimensional ESJDs. It is shown 
that this process is sometimes invalid when the target contains a mixture of scales that 
produce local maxima in ESJD and whose ratio increases without bound. Exact criteria 
are provided in Lemma 2 and are related to the numerical examples. 

Many "standard" sequences of distributions satisfy the condition that as $d \to \infty$ the 
probability mass becomes concentrated in a spherical shell which itself becomes infinites- 
imally thin relative to its radius. Thus the random walk on a rescaling of the target is, 
in the limit, effectively confined to the surface of this shell. Theorem 4 considers RWM 
algorithms on sequences of spherically symmetric unimodal targets where the sequence 
of proposal distributions satisfies this "shell condition" . It is shown that if the target 
sequence also satisfies the "shell condition" then the limiting optimal EAR is 0.234; how- 
ever, if the target mass does not converge to an infinitesimally thin shell then the limiting 
optimal EAR (if it exists) is strictly less than 0.234. Rescalings of both the target and 
proposal are usually required in order to stabilise the radius of the shell, whether or not 
it becomes infinitesimally thin. These influence the form of the optimal scale parameter 
so that in general it is not proportional to $d^{-1/2}$. Corollary 4 provides an explicit formula 
that is consistent with the numerical examples. 

Section 4 extends the results for finite dimensional random walks to all elliptically 
symmetric targets. Limit results are extended through Theorem 5 to sequences of elliptically 
symmetric targets for which the ellipses do not become too eccentric. The article 
concludes in Section 5 with a discussion. 

2. Exact results for finite dimension 

In this section we derive Theorem 1, which provides exact formulae for ESJD and EAR for 
a random walk Metropolis algorithm acting on a unimodal spherically symmetric target. 
The formulae in Theorem 1 refer to the target's marginal one-dimensional distribution 
function; these are then converted to use the more intuitive marginal radial distribution 
function. Several example targets are introduced and results from exact calculations of 
ESJD and EAR are presented. 

We adopt the notation outlined in Section 1.2. All distributions (target and proposal) 
are assumed to have densities with respect to Lebesgue measure, and we consider the 
chain to be stationary so that the marginal densities of both $\mathbf{X}$ and $\mathbf{X}'$ are $\pi(\cdot)$. We also 
assume that the space of possible values for element $\mathbf{x}$ of a d-dimensional chain is $\mathbb{R}^d$. 

We consider only target densities with a single mode; however, the density need not 
decrease with strict monotonicity and may have a series of plateaux. We refer to random 
variables with such densities as unimodal. In this section and the section that follows we 
further restrict our choice of target to include only random variables where the density 
has spherical contour lines. Such random variables are termed isotropic or spherically 
symmetric. ESJD is as defined in (4) where the expectation is taken with respect to the 
joint law for the current position and the realised jump. For a spherical target $\beta_i = \beta\ \forall i$, 
the ESJD is proportional to the expected squared Euclidean distance, and both are 
maximised by the same scaling $\lambda$. Since the constant of proportionality derives from 
an arbitrary definition of "unit scale parameter", we simply set it to 1 for spherically 
symmetric random variables. 

Denote the one-dimensional marginal distribution function of a general d-dimensional 
target $\mathbf{X}^{(d)}$ along unit vector $\hat{\mathbf{y}}$ as $F_{1|d}(x)$. When $\mathbf{X}^{(d)}$ is spherically symmetric, this is 
independent of $\hat{\mathbf{y}}$, and we simply refer to it as the one-dimensional marginal distribution 
function of $\mathbf{X}^{(d)}$. The following is proved in Appendix A.1. 

Theorem 1. Consider a stationary random walk Metropolis algorithm on a spherically 
symmetric unimodal target which has marginal one-dimensional distribution function 
$F_{1|d}(x)$. Let jumps be proposed from a symmetric density as defined in (1). In this case 
the expected acceptance rate and the expected square jump distance are 

$$\bar{\alpha}_d(\lambda) = 2E\left[F_{1|d}\left(-\tfrac{1}{2}\lambda|\mathbf{Y}|\right)\right] \quad\text{and} \tag{5}$$
$$S_d^2(\lambda) = 2\lambda^2 E\left[|\mathbf{Y}|^2 F_{1|d}\left(-\tfrac{1}{2}\lambda|\mathbf{Y}|\right)\right], \tag{6}$$

where the expectation is taken with respect to measure $r(\cdot)$. 

The marginal distribution function $F_{1|d}(-\lambda|\mathbf{Y}|/2)$ is bounded and decreasing in $\lambda$. 
Also $\lim_{x\to\infty} F_{1|d}(-x) = 0$ and by symmetry, provided $F_{1|d}(\cdot)$ is continuous at the origin, 
$\lim_{x\to 0} F_{1|d}(-x) = 0.5$. Applying the bounded convergence theorem to (5) we therefore 
obtain the following intuitive result: 

Corollary 1. Let $\lambda$ be the scaling parameter for any RWM algorithm on a unimodal 
isotropic target Lebesgue density. In this situation the EAR at stationarity $\bar{\alpha}_d(\lambda)$ decreases 
with increasing $\lambda$, with $\lim_{\lambda\to 0} \bar{\alpha}_d(\lambda) = 1$ and $\lim_{\lambda\to\infty} \bar{\alpha}_d(\lambda) = 0$. 
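
Theorem 1 and Corollary 1 can be checked informally by simulation. The sketch below assumes a standard Gaussian target in d dimensions, for which the one-dimensional marginal distribution function is the standard Gaussian distribution function, together with a standard Gaussian jump proposal. It compares a direct Monte Carlo estimate of the EAR with the expression $2E[F_{1|d}(-\lambda|\mathbf{Y}|/2)]$. The function names, sample sizes and choice of d are illustrative assumptions.

```python
import math
import random


def std_normal_cdf(x):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def ear_direct(d, lam, n, rng):
    """Monte Carlo estimate of E[min(1, pi(X*)/pi(X))] with X drawn from the target."""
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_star = [xi + lam * rng.gauss(0.0, 1.0) for xi in x]
        log_ratio = 0.5 * (sum(v * v for v in x) - sum(v * v for v in x_star))
        total += 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
    return total / n


def ear_theorem1(d, lam, n, rng):
    """Monte Carlo estimate of 2 E[F(-lam |Y| / 2)] with F the Gaussian cdf."""
    total = 0.0
    for _ in range(n):
        norm_y = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        total += 2.0 * std_normal_cdf(-0.5 * lam * norm_y)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    for lam in (0.5, 1.0, 2.0):
        a = ear_direct(5, lam, 40000, rng)
        b = ear_theorem1(5, lam, 40000, rng)
        print(f"lambda = {lam:.1f}  direct = {a:.3f}  theorem = {b:.3f}")
```

The two estimates should agree up to Monte Carlo error, and both should fall as the scale increases.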

In our search for an optimal scaling there is an implicit assumption that such a scaling 
exists. This was justified intuitively in Section 1 but the existence of an optimal scaling 
has previously only been proven for the limiting diffusion process as $d \to \infty$; see Roberts 
et al. [9]. Starting from Theorem 1 the following is relatively straightforward to prove (see 
Sherlock [11]) and starts to justify a search for an optimal scaling for a finite dimensional 
random walk algorithm rather than a limit process. 

Corollary 2. Consider a spherically symmetric unimodal d-dimensional target Lebesgue 
density $\pi(\mathbf{x})$. Let $\pi(\cdot)$ be explored via an RWM algorithm with proposal Lebesgue density 
$\frac{1}{\lambda^d} r(\mathbf{y}/\lambda)$. If $E_\pi[|\mathbf{X}|^2] < \infty$ and $E_r[|\mathbf{Y}|^2] < \infty$ then the ESJD of the Markov chain at 
stationarity attains its maximum at a finite non-zero value (or values) of $\lambda$. 

For the remainder of this section we examine the behaviour of real, finite dimensional 
examples of random walk algorithms. As well as being of interest in its own right, this 
will motivate Section 3 where Theorem 1 will provide the basis from which properties 
of EAR and ESJD are obtained as dimension $d \to \infty$. To render Theorem 1 of more use 
for practical calculation, we first convert it to involve the more intuitive marginal radial 
distribution rather than the marginal one-dimensional distribution function. 

We introduce some further notation; write $F_d(\cdot)$ and $f_d(\cdot)$ for the marginal radial 
distribution and density functions of the d-dimensional spherically symmetric target $\mathbf{X}^{(d)}$; 
these are the distribution and density functions of $|\mathbf{X}^{(d)}|$. The density of $|\mathbf{Y}|$ (when $\lambda = 1$) 
is denoted $r_d(\cdot)$. 

We start with a form for the one-dimensional marginal distribution function of a spher- 
ically symmetric random variable in terms of its marginal radial distribution function. 
Derivation of this result from first principles is straightforward; see Sherlock [11]. 

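Lemma 1. For any d-dimensional spherically symmetric random variable with continuous 
marginal radial distribution function $F_d(r)$ with $F_d(0) = 0$, the one-dimensional 
marginal distribution function along any axis is 

$$F_{1|d}(x) = \frac{1}{2}\left(1 + \mathrm{sign}(x)\int_0^\infty G_d\left(\frac{x^2}{r^2}\right) dF_d(r)\right), \tag{7}$$

where $\mathrm{sign}(x) = 1$ for $x > 0$ and $\mathrm{sign}(x) = -1$ for $x < 0$, and $G_d(\cdot)$ is the distribution 
function of $U_d$, with $U_1 = 1$ and 

$$U_d \sim \mathrm{Beta}\left(\frac{1}{2}, \frac{d-1}{2}\right) \quad (d > 1).$$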




For the RWM we are concerned only with targets with Lebesgue densities. In this case 
both the marginal one-dimensional and radial distribution functions are continuous, and 
$F_d(0) = 0$ as there can be no point mass at the origin (or anywhere else). Substituting 
(7) into (5) and (6) gives 

$$\bar{\alpha}_d(\lambda) = E_{\mathbf{Y},\mathbf{X}^{(d)}}\left[K_d\left(\frac{\lambda|\mathbf{Y}|}{2|\mathbf{X}^{(d)}|}\right)\right] \quad\text{and}\quad S_d^2(\lambda) = \lambda^2 E_{\mathbf{Y},\mathbf{X}^{(d)}}\left[|\mathbf{Y}|^2 K_d\left(\frac{\lambda|\mathbf{Y}|}{2|\mathbf{X}^{(d)}|}\right)\right],$$

where $\mathbf{Y}$ is a random variable with density $r(\cdot)$ and $K_d(x) := 1 - G_d(x^2)$. The expectations 
depend on $\mathbf{X}$ and $\mathbf{Y}$ only through their moduli, thus allowing expressions for EAR and 
ESJD in terms of simple double integrals involving the marginal radial densities of $|\mathbf{X}|$ 
and $|\mathbf{Y}|$. For unimodal spherically symmetric targets we therefore obtain: 



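$$\bar{\alpha}_d(\lambda) = \int_0^\infty dy \int_{\lambda y/2}^\infty dx\, r_d(y) f_d(x) K_d\left(\frac{\lambda y}{2x}\right), \tag{8}$$
$$S_d^2(\lambda) = \lambda^2 \int_0^\infty dy \int_{\lambda y/2}^\infty dx\, r_d(y) f_d(x)\, y^2 K_d\left(\frac{\lambda y}{2x}\right). \tag{9}$$

Since $\mathbf{X}^{(d)}$ is spherically symmetric $f_d(|\mathbf{x}|) = a_d |\mathbf{x}|^{d-1} \pi(\mathbf{x})$, where $a_d := 2\pi^{d/2}/\Gamma(d/2)$; 
see [1], Chapter 15. In the examples below we also consider only spherically symmetric 
proposals so that $r_d(|\mathbf{y}|) = a_d |\mathbf{y}|^{d-1} r(\mathbf{y})$. 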



2.1. Explicit and computational results 

Using (8) and (9) we first examine the dependency of EAR and ESJD on $\lambda$ for any given 
dimension. We then examine the behaviour of the optimal scaling and optimal acceptance 
rate as dimension d increases. 

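Now $K_1(u) = 1$ for $u < 1$ and $K_1(u) = 0$ otherwise, and so for one-dimensional RWM 
algorithms the integrals in (8) and (9) may sometimes be evaluated exactly. For example, 
with a Gaussian target and Gaussian proposal, (8) and (9) give 

$$\bar{\alpha}_1(\lambda) = \frac{2}{\pi}\tan^{-1}\left(\frac{2}{\lambda}\right) \quad\text{and}\quad S_1^2(\lambda) = \frac{2\lambda^2}{\pi}\left(\tan^{-1}\left(\frac{2}{\lambda}\right) - \frac{2\lambda}{\lambda^2+4}\right). \tag{10}$$

Maximising (10) numerically gives an optimal scaling of $\hat{\lambda} \approx 2.43$ which corresponds to 
an optimal EAR of 0.439. 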
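
The numerical maximisation can be reproduced with a few lines of Python. In the sketch below the EAR is $(2/\pi)\tan^{-1}(2/\lambda)$ and the ESJD is $(2\lambda^2/\pi)(\tan^{-1}(2/\lambda) - 2\lambda/(\lambda^2+4))$; the latter is our own evaluation of (6) for a standard Gaussian target and proposal and should be read as an assumption of this example. The golden-section routine and its search interval are likewise illustrative choices.

```python
import math


def ear_gauss_1d(lam):
    """Expected acceptance rate for a 1-d Gaussian target and Gaussian proposal."""
    return (2.0 / math.pi) * math.atan(2.0 / lam)


def esjd_gauss_1d(lam):
    """Expected square jump distance for the same algorithm."""
    return (2.0 * lam ** 2 / math.pi) * (
        math.atan(2.0 / lam) - 2.0 * lam / (lam ** 2 + 4.0)
    )


def golden_section_max(f, lo, hi, tol=1e-8):
    """Locate a maximiser of f on [lo, hi], assuming f has a single peak there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the peak lies to the left of d
        else:
            a = c  # the peak lies to the right of c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


lam_hat = golden_section_max(esjd_gauss_1d, 0.1, 20.0)
print(f"optimal scaling = {lam_hat:.3f}, optimal EAR = {ear_gauss_1d(lam_hat):.3f}")
```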

With both target and proposal following a double exponential distribution, (8) and (9) 
produce 

$$\bar{\alpha}_1(\lambda) = \frac{2}{2+\lambda} \quad\text{and}\quad S_1^2(\lambda) = \frac{16\lambda^2}{(2+\lambda)^3}.$$

$S_1^2$ and $\bar{\alpha}_1$ are thus related by the simple analytical expression $S_1^2 = 8\bar{\alpha}_1(1-\bar{\alpha}_1)^2$, and 
the ESJD attains a maximum at an EAR of 1/3, for which $\lambda = 4$. 

We now consider two example targets with d = 10: first, a simple Gaussian 
($\pi_d(\mathbf{x}) \propto e^{-|\mathbf{x}|^2/2}$) and second, a mixture of Gaussians: 

$$\pi_d(\mathbf{x}) \propto (1-p_d)\, e^{-|\mathbf{x}|^2/2} + p_d \frac{1}{d^d}\, e^{-|\mathbf{x}|^2/(2d^2)} \tag{11}$$

with $p_d = 1/d^2$. Both targets are explored using spherically symmetric Gaussian proposals; 
results are shown in Figure 1. As with the previous two examples, increasing $\lambda$ from 
0 to $\infty$ decreases the EAR from 1 to 0, as deduced in Corollary 1. Further, in all four 
examples, as noted in Corollary 2, ESJD achieves a global maximum at finite, strictly 
positive values of $\lambda$. In the first three examples ESJD as a function of the scaling shows a 
single maximum; however, in the mixture example similar high ESJDs are achieved with 
two very different scale parameters (approximately 0.8 and 7.6). The acceptance rates 
at these maxima are 0.26 and 0.0026, respectively. The values $\lambda = 0.8$ and $\bar{\alpha} = 0.26$ are 
almost identical to the optimal values for exploring a standard ten-dimensional Gaussian 
and so are ideal for exploring the first component of the mixture. Optimal exploration 
of the second component is clearly to be achieved by increasing the scale parameter by 
a factor of 10; however, the second component has a mixture weight of 0.01 and so the 
acceptance rate for such proposals is reduced accordingly. The mixture weighting of the 
second component, $1/d^2$, is just sufficient to balance the increase in optimal jump size 
for that component, with the result that the two peaks in ESJD are of equal heights. 
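
The two-peak behaviour can be reproduced from Theorem 1 by simple quadrature. The sketch below assumes that the mixture target has components $N(\mathbf{0}, \mathbf{I}_d)$ and $N(\mathbf{0}, d^2\mathbf{I}_d)$ with weights $1 - p_d$ and $p_d$, so that its one-dimensional marginal distribution function is the corresponding mixture of Gaussian distribution functions, and that $|\mathbf{Y}|$ follows a chi distribution with d degrees of freedom (Gaussian proposal). The grid sizes and function names are illustrative assumptions.

```python
import math


def std_normal_cdf(x):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def chi_grid(d, n=400):
    """Midpoint grid and weights approximating the law of |Y| for Y ~ N(0, I_d)."""
    upper = math.sqrt(d) + 8.0
    h = upper / n
    log_norm = (1.0 - 0.5 * d) * math.log(2.0) - math.lgamma(0.5 * d)
    pts = [(i + 0.5) * h for i in range(n)]
    wts = [h * math.exp(log_norm + (d - 1) * math.log(r) - 0.5 * r * r) for r in pts]
    return pts, wts


def mixture_marginal_cdf(x, d, p):
    """Marginal cdf of the mixture: N(0, I) with weight 1 - p, N(0, d^2 I) with weight p."""
    return (1.0 - p) * std_normal_cdf(x) + p * std_normal_cdf(x / d)


def ear_and_esjd(lam, d, p, grid):
    """Evaluate the expressions of Theorem 1 by quadrature over |Y|."""
    pts, wts = grid
    ear = 0.0
    esjd = 0.0
    for r, w in zip(pts, wts):
        f = mixture_marginal_cdf(-0.5 * lam * r, d, p)
        ear += 2.0 * w * f
        esjd += 2.0 * lam * lam * w * r * r * f
    return ear, esjd


def local_maxima(d=10, lam_max=15.0, step=0.05):
    """Return (scaling, EAR, ESJD) at each local maximum of ESJD on a grid."""
    p = 1.0 / d ** 2
    grid = chi_grid(d)
    lams = [step * k for k in range(1, int(round(lam_max / step)) + 1)]
    vals = [ear_and_esjd(lam, d, p, grid) for lam in lams]
    peaks = []
    for i in range(1, len(lams) - 1):
        if vals[i][1] > vals[i - 1][1] and vals[i][1] >= vals[i + 1][1]:
            peaks.append((lams[i], vals[i][0], vals[i][1]))
    return peaks


if __name__ == "__main__":
    for lam, ear, esjd in local_maxima():
        print(f"scaling = {lam:.2f}  EAR = {ear:.4f}  ESJD = {esjd:.3f}")
```

A grid search is used rather than a single-peak optimiser because the ESJD curve for this target is not unimodal in the scaling.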

We next examine the behaviour of the optimal scaling and the corresponding EAR as 
d increases. Calculations are performed for eight different targets: 

1. Gaussian density: $\pi_d(\mathbf{x}) \propto e^{-|\mathbf{x}|^2/2}$; 

2. exponential density: $\pi_d(\mathbf{x}) \propto e^{-|\mathbf{x}|}$; 

3. target with a Gaussian marginal radial density: $\pi_d(\mathbf{x}) \propto |\mathbf{x}|^{-d+1} e^{-|\mathbf{x}|^2/2}$; 

4. target with an exponential marginal radial density: $\pi_d(\mathbf{x}) \propto |\mathbf{x}|^{-d+1} e^{-|\mathbf{x}|}$; 

5. lognormal density altered so as to be unimodal: 

$$\pi_d(\mathbf{x}) \propto 1\{|\mathbf{x}| \le e^{-(d-1)}\} + e^{-(\log|\mathbf{x}| + (d-1))^2/2}\, 1\{|\mathbf{x}| > e^{-(d-1)}\};$$

6. the mixture of Gaussians given by (11) with $p_d = 0.2$; 

7. the mixture of Gaussians given by (11) with $p_d = 1/d$; 

8. the mixture of Gaussians given by (11) with $p_d = 1/d^3$. 

Proposals are generated from a Gaussian density. For each combination of target and 
proposal simple numerical routines are employed to find the scaling A that produces the 
largest ESJD. Substitution into (8) gives the corresponding optimal EAR $\hat{\alpha}$. 
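
A minimal sketch of such a routine for example target 1 is given below. It is not the code used to produce the results reported here. It evaluates the EAR and ESJD of Theorem 1 by midpoint quadrature, using the facts that the marginal distribution function of a standard Gaussian target is the standard Gaussian distribution function and that $|\mathbf{Y}|$ has a chi distribution under a Gaussian proposal, and then maximises the ESJD by golden-section search. Grid sizes, search interval and names are assumptions.

```python
import math


def std_normal_cdf(x):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def chi_grid(d, n=600):
    """Midpoint grid and weights approximating the law of |Y| for Y ~ N(0, I_d)."""
    upper = math.sqrt(d) + 8.0
    h = upper / n
    log_norm = (1.0 - 0.5 * d) * math.log(2.0) - math.lgamma(0.5 * d)
    pts = [(i + 0.5) * h for i in range(n)]
    wts = [h * math.exp(log_norm + (d - 1) * math.log(r) - 0.5 * r * r) for r in pts]
    return pts, wts


def gaussian_ear_esjd(lam, grid):
    """EAR and ESJD from Theorem 1 for a standard Gaussian target."""
    pts, wts = grid
    ear = 0.0
    esjd = 0.0
    for r, w in zip(pts, wts):
        f = std_normal_cdf(-0.5 * lam * r)
        ear += 2.0 * w * f
        esjd += 2.0 * lam * lam * w * r * r * f
    return ear, esjd


def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate a maximiser of f on [lo, hi], assuming f has a single peak there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def optimal_gaussian(d):
    """Return the ESJD-maximising scaling and the corresponding EAR in dimension d."""
    grid = chi_grid(d)
    lam_hat = golden_section_max(lambda lam: gaussian_ear_esjd(lam, grid)[1], 1e-3, 10.0)
    return lam_hat, gaussian_ear_esjd(lam_hat, grid)[0]


if __name__ == "__main__":
    for d in (1, 2, 5, 10, 50, 100):
        lam_hat, ear_hat = optimal_gaussian(d)
        print(f"d = {d:3d}  optimal scaling = {lam_hat:.3f}  optimal EAR = {ear_hat:.3f}")
```

Other spherically symmetric targets can be handled in the same way by replacing `std_normal_cdf` with the appropriate one-dimensional marginal distribution function.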

Figure 2 shows plots of optimal EAR against dimension for example targets 1-4. The 
first of these is entirely consistent with Figure 4 in Roberts and Rosenthal [10], which 
shows optimal acceptance rates obtained through repeated runs of the RWM algorithm. 
The first two are consistent with a conjecture that the optimal EAR approaches 0.234 
as $d \to \infty$; however, for example targets 3 and 4, the optimal EAR appears to approach 
limits of approximately 0.10 and 0.06, respectively. 




Figure 1. Plots for a Gaussian target (left) and the Gaussian mixture target of (11) with 
$p_d = 1/d^2$ (right), both at d = 10 and with a Gaussian jump proposal. Panels from top to 
bottom are (i) ESJD against scaling, (ii) EAR against scaling and (iii) ESJD against EAR. 



For target 5 with d = 1,2 or 3, plots of ESJD against scale parameter, EAR against 
scale parameter and ESJD against EAR (not shown) are heuristically similar to those for 
the standard Gaussian target in Figure 1. However, for d = 1, 2 and 3 the optimal EARs 
are approximately 0.111, 0.010 and 0.00057, respectively, and appear to be approaching 
a limiting optimal acceptance rate of 0. 

Figure 3 shows plots of optimal EAR against dimension for the three mixture targets (6-8). 
Here the asymptotically optimal EAR appears to be approximately 0.234/5, 0 and 0.234, 
respectively. The limiting behaviour of each of these examples is explained in the next 
section. 



3. Limit results for spherically symmetric 
distributions 

Theorem 1 provides exact analytical forms for the EAR and ESJD of an RWM algorithm 
on a unimodal spherically symmetric target in terms of the target's marginal 
one-dimensional distribution function. In this section we investigate the behaviour of 
EAR and ESJD in the limit as dimension $d \to \infty$. As groundwork for this investigation 
we must first examine the possible limiting forms of the marginal one-dimensional 
distribution function of a spherically symmetric random variable. We adopt the following 
notation: convergence in distribution is denoted by $\xrightarrow{D}$, convergence in probability is 
denoted by $\xrightarrow{p}$ and convergence in mean square by $\xrightarrow{m.s.}$. 

Figure 2. Plots of the optimal EAR $\hat{\alpha}$ against dimension for example targets 1-4 using a 
Gaussian jump proposal. The horizontal dotted line approximates the apparent asymptotically 
optimal acceptance rate of 0.234 in the first two plots and 0.10 and 0.06 in the third and fourth 
plots, respectively. 

Convergence of the sequence of characteristic functions of a sequence of d-dimensional 
isotropic random variables (indexed by d) to that of a mixture of normals is proved 
as Theorem 2.21 of Fang et al. [4]. Thus the limiting marginal distribution along any 
given axis may be written as $X_1 = RZ$ with $Z$ a standard Gaussian and $R$ the mixing 
distribution. Sherlock [11] proves from first principles the following extension. 

Theorem 2. Let $\mathbf{X}^{(d)}$ be a sequence of d-dimensional spherically symmetric random 
variables. If there is a $k_d$ such that $|\mathbf{X}^{(d)}|/k_d \xrightarrow{D} R$ then the sequence of marginal 
one-dimensional distributions of $\mathbf{X}^{(d)}$ satisfies 

$$F_{1|d}\left(\frac{k_d}{d^{1/2}}\, x_1\right) \to \Theta(x_1) := E_R\left[\Phi\left(\frac{x_1}{R}\right)\right],$$

where $\Phi(\cdot)$ is the standard Gaussian distribution function. 

Figure 3. Plots of the optimal EAR $\hat{\alpha}$ against dimension for example targets 6-8 using a 
Gaussian jump proposal. The horizontal dotted line in each plot represents the apparent 
asymptotically optimal acceptance rate of 0.234/5, 0 and 0.234, respectively. 



$|\mathbf{X}^{(d)}|$ possesses a Lebesgue density and therefore no point mass at the origin; however, 
the rescaled limit $R$ may possess such a point mass. Provided $R$ has no point mass at 0, 
the limiting marginal one-dimensional distribution function $\Theta(x_1)$ as defined in Theorem 
2 is therefore continuous for all $x_1 \in \mathbb{R}$. This continuity implies that the limit in Theorem 
2 is approached uniformly in $x_1$, and for this reason the lack of a radial point mass at 
0 is an essential requirement in Theorem 3. 

The condition of convergence of the rescaled modulus to 1 or to a random variable $R$ 
will turn out to be the key factor in determining the behaviour of the optimal EAR as 
$d \to \infty$; we now examine this limiting convergence behaviour in more detail. 

For many standard sequences of density functions there is a $k_d$ such that 
$|\mathbf{X}^{(d)}|/k_d \xrightarrow{p} 1$. This includes example targets 1 and 2 from Section 2.1, and more 
generally any density of the form $\pi_d(\mathbf{x}) \propto |\mathbf{x}|^a e^{-|\mathbf{x}|^c}$. An intuitive understanding of target 
sequences satisfying this condition is that, as $d \to \infty$ the probability mass becomes concentrated in a spherical 
shell which itself becomes infinitesimally thin relative to its radius. The random walk on 
a rescaling of the target is, in the limit, effectively confined to the surface of this shell. 
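
The thinning of the shell is easy to quantify for example target 1. For a standard Gaussian target the modulus $|\mathbf{X}^{(d)}|$ has a chi distribution with d degrees of freedom, whose mean and variance are available in closed form. The sketch below, with a hypothetical helper name, computes the ratio of standard deviation to mean, which behaves like $1/\sqrt{2d}$ for large d.

```python
import math


def chi_mean_and_sd(d):
    """Mean and standard deviation of the chi distribution with d degrees of freedom."""
    log_mean = 0.5 * math.log(2.0) + math.lgamma(0.5 * (d + 1)) - math.lgamma(0.5 * d)
    mean = math.exp(log_mean)
    # E[|X|^2] = d for a standard Gaussian, so Var(|X|) = d - mean^2.
    var = d - mean * mean
    return mean, math.sqrt(var)


if __name__ == "__main__":
    for d in (10, 100, 1000):
        mean, sd = chi_mean_and_sd(d)
        print(f"d = {d:5d}  radius = {mean:8.3f}  relative thickness = {sd / mean:.4f}")
```

The radius grows like $d^{1/2}$ while the thickness of the shell stays of order one, so the relative thickness shrinks to zero.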

Example targets 3 and 4 have marginal radial distributions which are always respec- 
tively a positive unit Gaussian and a unit exponential. The first term in the density 
of example target 5 simply ensures unimodality and becomes increasingly unimportant 
as d increases. Trivial algebraic rearrangement of the second component shows that its 
marginal radial distribution has the same log-normal form whatever the dimension. Example 
targets 6-8 are examined in detail in Section 3.2. 

3.1. A limit theorem for EAR and ESJD 

We now return to the RWM and derive limiting forms for ESJD and EAR on unimodal 
spherically symmetric targets as $d \to \infty$. Henceforth it is assumed that the radial 
distribution of the target, rescaled by a suitable quantity $k_x^{(d)}$, converges weakly to some 
continuous limiting distribution, that of a random variable $R$. From Theorem 2, the 
limiting marginal distribution function $\Theta(\cdot)$ is in general a scaled mixture of Gaussian 
distribution functions but in the special case that $R$ is a point mass at 1 the scaled 
mixture of Gaussians clearly reduces to the standard Gaussian cumulative distribution 
function $\Phi(\cdot)$: $F_{1|d}((k_x^{(d)}/d^{1/2})\, x_1) \to \Phi(x_1)$. 

Consider a sequence of jump proposal random variables $\{\mathbf{Y}^{(d)}\}$ with unit scale 
parameter. If there exist $k_y^{(d)}$ such that $|\mathbf{Y}^{(d)}|/k_y^{(d)}$ converges (in a sense to be defined) 
then simple limit results are possible. Implicit in the derivation of these limit results is a 
transformation of our target and proposal: $\mathbf{X}^{(d)} \leftarrow \mathbf{X}^{(d)}/k_x^{(d)}$ and $\mathbf{Y}^{(d)} \leftarrow \mathbf{Y}^{(d)}/k_y^{(d)}$. We 
define a transformed scale parameter 

$$\mu_d := \frac{1}{2}\, \frac{d^{1/2} k_y^{(d)}}{k_x^{(d)}}\, \lambda_d. \tag{12}$$

A random walk on target density $(k_x^{(d)})^d \pi_d(k_x^{(d)}\mathbf{x})$ using proposal density $(k_y^{(d)})^d r_d(k_y^{(d)}\mathbf{y})$ 
and scale parameter $2\mu_d$ is therefore equivalent to a random walk on $\pi_d(\mathbf{x})$ using proposal 
$r_d(\mathbf{y})$ and a scale parameter $l = d^{1/2}\lambda_d$, a quantity which is familiar from the 
diffusion-based approach to optimal scaling (see Section 1.1). The following theorem characterises 
the limiting behaviour for EAR and ESJD for fixed values, $\mu$, of the transformed scale 
parameter; it is proved in Appendix A.2. 

Theorem 3. Let $\{\mathbf{X}^{(d)}\}$ be a sequence of d-dimensional unimodal spherically symmetric 
target random variables and let $\{\mathbf{Y}^{(d)}\}$ be the corresponding sequence of jump proposals. 
If there exist $\{k_x^{(d)}\}$ such that $|\mathbf{X}^{(d)}|/k_x^{(d)} \xrightarrow{D} R$ where $R$ has no point mass at 0 then for 
fixed $\mu$: 

(i) If there exist $\{k_y^{(d)}\}$ such that $|\mathbf{Y}^{(d)}|/k_y^{(d)} \xrightarrow{D} Y$ then 

$$\bar{\alpha}_d(\mu) \to 2E\left[\Phi\left(-\frac{\mu Y}{R}\right)\right]. \tag{13}$$

(ii) If in fact $|\mathbf{Y}^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} Y$ with $E[Y^2] < \infty$ then 

$$\frac{d}{4 (k_x^{(d)})^2}\, S_d^2(\mu) \to 2\mu^2 E\left[Y^2\, \Phi\left(-\frac{\mu Y}{R}\right)\right]. \tag{14}$$

The remainder of this paper focusses on an important corollary to Theorem 3, which 
is obtained by setting $Y = 1$. 

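Corollary 3. Let $\{\mathbf{X}^{(d)}\}$, $\{\mathbf{Y}^{(d)}\}$, $\{k_x^{(d)}\}$ and $\{k_y^{(d)}\}$ be as defined in Theorem 3 and let 
$R$ be any non-negative random variable with no point mass at 0. 

(i) If $|\mathbf{X}^{(d)}|/k_x^{(d)} \xrightarrow{D} R$ and $|\mathbf{Y}^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$ then 

$$\bar{\alpha}_d(\mu) \to 2E\left[\Phi\left(-\frac{\mu}{R}\right)\right], \tag{15}$$
$$\frac{d}{4 (k_x^{(d)})^2}\, S_d^2(\mu) \to 2\mu^2 E\left[\Phi\left(-\frac{\mu}{R}\right)\right]. \tag{16}$$

(ii) If $|\mathbf{X}^{(d)}|/k_x^{(d)} \xrightarrow{p} 1$ and $|\mathbf{Y}^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$ then 

$$\bar{\alpha}_d(\mu) \to 2\Phi(-\mu), \tag{17}$$
$$\frac{d}{4 (k_x^{(d)})^2}\, S_d^2(\mu) \to 2\mu^2 \Phi(-\mu). \tag{18}$$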

With these asymptotic forms for EAR and ESJD we are finally equipped to examine 
the issue of optimal scaling in the limit as $d \to \infty$. 
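
When both target and proposal satisfy the shell condition, the limiting efficiency is proportional to $2\mu^2\Phi(-\mu)$ and the limiting acceptance rate is $2\Phi(-\mu)$. The short sketch below maximises the former numerically and evaluates the latter at the maximiser, which recovers the familiar 0.234. The golden-section routine, its search interval and the function names are illustrative assumptions.

```python
import math


def std_normal_cdf(x):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def limiting_ear(mu):
    """Limiting expected acceptance rate 2 * Phi(-mu)."""
    return 2.0 * std_normal_cdf(-mu)


def limiting_esjd(mu):
    """Limiting rescaled ESJD 2 * mu^2 * Phi(-mu)."""
    return 2.0 * mu * mu * std_normal_cdf(-mu)


def golden_section_max(f, lo, hi, tol=1e-8):
    """Locate a maximiser of f on [lo, hi], assuming f has a single peak there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


mu_hat = golden_section_max(limiting_esjd, 0.01, 10.0)
print(f"optimal mu = {mu_hat:.3f}, limiting optimal EAR = {limiting_ear(mu_hat):.3f}")
```

Doubling the maximiser gives the value of $l$ close to 2.38 that is familiar from the diffusion-based approach.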



3.2. The validity and existence of an asymptotically optimal 
scaling 

It was shown in Section 2 that there is at least one finite optimal scaling for any spherically 
symmetric unimodal finite dimensional target with finite second moment provided the 
second moment of the proposal is also finite. We now investigate the validity and existence 
of a finite asymptotically optimal (transformed) scaling for spherically symmetric targets 
as $d \to \infty$. 

1. Validity: We shall obtain an asymptotically optimal scaling by maximising the 
limiting efficiency function. Ideally we would instead find the limit of the sequence of 
scalings which maximise each finite dimensional efficiency function. We investigate 
the circumstances under which these are equivalent. 

2. Existence: It is not always the case that the limiting efficiency function possesses 
a finite maximum; examples are provided. 

An even stronger validity assumption is implicit in works such as Roberts et al. [9], 
Roberts and Rosenthal [10] and Bedard [2] . In each of these papers a limiting process is 
found and the efficiency of this limiting process is maximised to give an asymptotically 
optimal scaling. 

For a given sequence of targets and proposals with optimal scalings $\hat{\lambda}_d$, we seek the 
limiting transformed optimal scaling $\hat{\mu} := \lim_{d\to\infty} \hat{\mu}_d$, where $\hat{\mu}_d$ is given in terms of 
$\hat{\lambda}_d$ by (12). The optimal scaling as $d \to \infty$ would therefore be $\hat{\lambda}_d \sim (2 k_x^{(d)} \hat{\mu})/(d^{1/2} k_y^{(d)})$. 
However the value $\hat{\mu}$ will be obtained by maximising $2\mu^2\Theta(-\mu) \propto \lim_{d\to\infty} S_d^2(\mu)$, where 
$\Theta$ is defined as in Theorem 2. The following result indicates when the scaling that optimises 
the limit is equivalent to the limit of the optimal scalings. A proof is provided in the Appendix. 


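Lemma 2. Let $\{S_d^2(\mu)\}$ be a sequence of functions defined on $[0,\infty)$ with continuous 
pointwise limit $S^2(\mu)$. Define 

$$M := \{\arg\max_{\mu} S^2(\mu)\} \quad\text{and}\quad M_d := \{\arg\max_{\mu} S_d^2(\mu)\}.$$

For each $d \in \mathbb{N}$ select any $\hat{\mu}_d \in M_d$. 

(i) If $M = \{\hat{\mu}\}$ and $\hat{\mu}_d < a < \infty\ \forall d$ then $\hat{\mu}_d \to \hat{\mu}$. 

(ii) If $M = \{\hat{\mu}\}$ then $\exists$ a sequence $\mu_d^* \to \hat{\mu}$ where each $\mu_d^*$ is a local maximum of $S_d^2(\cdot)$. 

(iii) If $M = \emptyset$ ($S^2$ has no finite maximum) then $\hat{\mu}_d \to \infty$, that is, $\lim_{d\to\infty}(\min M_d) = \infty$. 

(iv) If $M \ne \emptyset$ and $\hat{\mu}_d < a < \infty\ \forall d$ then $S^2(\hat{\mu}_d) \to S^2(\hat{\mu})$ for any $\hat{\mu} \in M$. 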

We now highlight certain aspects of Lemma 2 through reference to the mixture target 
(11), and specifically to target examples 6-8 from Section 2.1. Later in this section 
Lemma 2 is also applied to target examples 1-5. In all that follows consider the sequence 
of graphs of S d (fid) against (id, where fid is given by (12); consider also the graph of 
the pointwise limit, S 2 (fi), against fi. For all targets of the form (11) with sufficiently 
large d (so that the components are sufficiently separated in scale) each graph of S^ifid) 
against \id has two peaks, and a different rescaling ki d ^ applies to each component of 
the mixture. Choosing to rescale by the higher ki would stabilise the right-hand peak 
while the left would approach fid = 0. However (unless pd — > 1) this choice of scaling 
would create a point mass at the origin in the limiting radial distribution function 0(-), 
which is forbidden in the statement of Theorem 3. In order to apply the theorem we 
must therefore rescale by the lower kx which stabilises the left-hand peak while the 
right-hand peak drifts off to fid = oo and is therefore not present in the pointwise limit. 
The existence and consistency of a limiting optimal scaling then depend on the relative 
heights of the peaks which in turn depend on the limiting behaviour of pd- 

First consider any target with pd > 1/d 2 such as target examples 6 and 7. For a given 
dimension this would produce plots similar to the right-hand panels of Figure 1 but with 
the right-hand peak higher than the left-hand peak and therefore providing the optimal 
scaling, fid- The limit of the scalings which maximise each finite dimensional ESJD is 
therefore not the same as the scaling which optimises the limiting ESJD. In Lemma 2 
Parts (i) and (iv) this situation is prevented through the condition fid<a<oo. 

Suppose in fact that pd — > p > 0, so that rescaling via the lower ki^ produces a point 
mass at oo in the limiting rescaled radial distribution. Consider first the optimal scaling 
obtained from the limiting form of the ESJD. By Theorem 2, 0(— xi) —>p/2 as x\ — ► oo. 
Hence the limiting ESJD given in Corollary 3 increases without bound as /j->oo; this 



A.3. 




oo. 



788 



C. Sherlock and G. Roberts 



is an example of case (iii) in Lemma 2. The optimal scaling for exploring a real d- 
dimensional target follows the portion of the target with the larger scale parameter, and 
so in the limit accepted jumps only arise from this portion of the target. The limit of the 
optimal EAR is therefore the limiting optimal EAR for the larger component multiplied 
by a factor p, as suggested by the results for target example 6. 

If $p_d \to p = 0$ then only the left-hand peak affects the forms in Corollary 3; the optimal 
scaling and acceptance rate calculated from this corollary are therefore identical to those 
for target example 1. However the true optimal scaling follows the right-hand peak and 
so the true limiting optimal acceptance rate is 0, as suggested by the results for target 
example 7. 

Alternatively if $p_d < 1/d^2$ then for large enough $d$ the stabilised left-hand peak 
dominates, $\hat\mu_d$ is bounded and the limit of the maxima is the maximum of the limit function. 
The true limiting optimal acceptance rate is exactly that of the lower component as 
suggested for target example 8, and this is given correctly by Corollary 3. 

Provided $p_d \to 0$ the limiting forms for EAR and ESJD are unaffected by the speed 
at which this limit is approached. The limiting forms are therefore uninformative about 
whether or not the second peak is important. This is a fundamental issue with the 
identifiability of a limiting optimal scaling from the limiting ESJD. 
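The role of the boundedness condition in Lemma 2 can be illustrated with a toy sequence. The sketch below is not the mixture target (11): it assumes a hypothetical family `s_d` formed by adding to the limit $2\mu^2\Phi(-\mu)$ a bump centred at $\mu = d$ that is taller than the left-hand peak. The bump vanishes pointwise as $d$ grows, so the pointwise limit has its maximum near 1.19, yet the maximiser of each `s_d` follows the bump.

```python
import math


def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def s_limit(mu):
    # Pointwise limit: the form 2 mu^2 Phi(-mu) of (18).
    return 2.0 * mu * mu * Phi(-mu)


def s_d(mu, d):
    # Hypothetical finite-d efficiency (an assumption for illustration only):
    # the limit plus a bump of height 0.5 centred at mu = d.
    return s_limit(mu) + 0.5 * math.exp(-((mu - d) ** 2))


def argmax_on_grid(f, upper=40.0, step=0.01):
    # Crude grid search over [0, upper].
    n = int(round(upper / step))
    grid = [i * step for i in range(n + 1)]
    return max(grid, key=f)


mu_limit = argmax_on_grid(s_limit)
mu_hats = {d: argmax_on_grid(lambda mu, d=d: s_d(mu, d)) for d in (5, 10, 20)}
print("maximiser of the limit:", mu_limit)
print("maximisers of s_d:", mu_hats)
```

In this toy family the maximisers are unbounded in $d$, so parts (i) and (iv) of Lemma 2 do not apply, and the maximiser of the limit says nothing about the limit of the maximisers.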

The above clearly generalises from the specific form (11) so that failure of the 
boundedness condition on $\hat\mu_d$ in Lemma 2 intuitively corresponds to a target sequence that contains 
a mixture of scales that produce local maxima in ESJD and whose ratio increases without 
bound. In general, targets that vary on at least two very different scales are not amenable 
to the current approach. Indeed the very existence of a single "optimal" scaling is highly 
debatable. We wish to work with the limit $S^2(\mu)$, accepting its potential limitation. 
Therefore define $\hat\mu := \min M$ (or $\hat\mu := \infty$ if $M = \emptyset$) to be the asymptotically optimal 
transformed scaling (AOTS), and $\hat\lambda_d = (2k_x^{(d)}\hat\mu)/(d^{1/2}k_y^{(d)})$ to be the asymptotically 
optimal scaling (AOS). These are equivalent to the limit of the optimal (transformed) 
scalings provided $\hat\mu_d < a < \infty$ $\forall d$. Similarly the asymptotically optimal expected 
acceptance rate (AOA) is the limiting EAR that results from using the AOTS. 

We now turn to the existence of an asymptotically optimal scaling. The practising 
statistician is free to choose the proposal distribution and we therefore assume throughout 
the remainder of our discussion of spherically symmetric targets that there is a sequence 
$k_y^{(d)}$ such that the transformed proposal satisfies $|Y^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$. 

First consider the special case where there is a sequence $k_x^{(d)}$ such that the transformed 
target satisfies $|X^{(d)}|/k_x^{(d)} \xrightarrow{p} 1$. Differentiating (18) we see that the optimal scaling must 
satisfy $2\Phi(-\hat\mu_p) = \hat\mu_p\phi(-\hat\mu_p)$, which gives $\hat\mu_p \approx 1.19$. Substituting into (17) provides the 
EAR at this optimal scaling: $\hat\alpha_p \approx 0.234$, as suggested by the finite dimensional results 
for target examples 1 and 2. More generally $|X^{(d)}|/k_x^{(d)} \xrightarrow{D} R$. Following our discussions 
on validity we now assume that $R$ contains no point mass at 0 or $\infty$. In general we seek 
a finite scaling $\hat\mu$ that maximises the pointwise limit of $S_d^2$ as given in (16); we then 
compute the EAR using (15). We illustrate this process with reference to three of our 
non-standard examples from Section 2.1. 
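The two constants quoted for the shell case can be reproduced with a few lines of code. The sketch below assumes only the limiting forms stated in the text, namely efficiency proportional to $2\mu^2\Phi(-\mu)$ and EAR $2\Phi(-\mu)$; the function names and the bisection bracket are illustrative choices, not part of the paper.

```python
import math


def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def stationarity_gap(mu):
    # d/dmu [2 mu^2 Phi(-mu)] = 2 mu [2 Phi(-mu) - mu phi(mu)],
    # so a positive optimum solves 2 Phi(-mu) = mu phi(mu).
    return 2.0 * Phi(-mu) - mu * phi(mu)


def solve_mu_p(lo=0.5, hi=3.0, tol=1e-10):
    # Bisection: the gap is positive at lo and negative at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationarity_gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mu_p = solve_mu_p()
alpha_p = 2.0 * Phi(-mu_p)
print("mu_p   =", round(mu_p, 4))
print("alpha_p =", round(alpha_p, 4))
```

Since $\phi$ is symmetric, $\phi(-\mu) = \phi(\mu)$, so the condition coded here is the one displayed in the text.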

For target examples 3 and 4 the marginal radial distribution is positive Gaussian 
($\theta_d(r) \propto e^{-r^2/2}$) and exponential ($\theta_d(r) \propto e^{-r}$), respectively. $S^2(\mu)$ is maximised at 
$\hat\mu = 1.67$ and $\hat\mu = 2.86$, respectively, which correspond to EARs of 0.091 and 0.055, consistent 
with the findings in Section 2.1. In target examples 1-4, the ESJD of each element in 
the sequence has a single maximum, so by Lemma 2(ii) the limit of these maxima is the 
maximum of the limit function, subject to the scaling $k_x^{(d)}$. 
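The values quoted for target examples 3 and 4 can be checked numerically. The following sketch assumes the limiting forms used in this section, $\Theta(-\mu) = E[\Phi(-\mu/R)]$ with efficiency proportional to $2\mu^2\Theta(-\mu)$ and EAR $2\Theta(-\mu)$, and takes the radial densities to be $e^{-r^2/2}$ and $e^{-r}$ on $r > 0$. The midpoint quadrature, the truncation at `r_max` and the grid search are illustrative choices rather than the method used in the paper.

```python
import math


def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def radial_grid(log_density, r_max=30.0, n=1500):
    # Midpoint-rule nodes and normalised weights for an unnormalised
    # radial density on (0, r_max).
    h = r_max / n
    nodes = [(i + 0.5) * h for i in range(n)]
    weights = [math.exp(log_density(r)) for r in nodes]
    total = sum(weights)
    return nodes, [w / total for w in weights]


def theta(mu, nodes, weights):
    # Theta(-mu) = E[Phi(-mu / R)].
    return sum(w * Phi(-mu / r) for r, w in zip(nodes, weights))


def optimise(log_density, mu_lo=0.5, mu_hi=5.0, step=0.01):
    # Grid search for the maximiser of 2 mu^2 Theta(-mu); returns the
    # maximiser and the corresponding limiting EAR, 2 Theta(-mu).
    nodes, weights = radial_grid(log_density)
    n_steps = int(round((mu_hi - mu_lo) / step))
    grid = [mu_lo + i * step for i in range(n_steps + 1)]
    mu_hat = max(grid, key=lambda mu: 2.0 * mu * mu * theta(mu, nodes, weights))
    return mu_hat, 2.0 * theta(mu_hat, nodes, weights)


mu_gauss, ear_gauss = optimise(lambda r: -0.5 * r * r)  # positive Gaussian radius
mu_exp, ear_exp = optimise(lambda r: -r)                # exponential radius
print("positive Gaussian:", round(mu_gauss, 2), round(ear_gauss, 3))
print("exponential:      ", round(mu_exp, 2), round(ear_exp, 3))
```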

For target example 5 the limiting transformed marginal radial density is $\theta(r) \propto 
e^{-(\log r)^2}$ and numerical evaluation shows $S^2(\mu)$ to be bounded above but to increase 
monotonically with $\mu$; this corresponds to case (iii) of Lemma 2. As with target example 
6 this provides a situation where $S^2(\mu)$ is increasing as $\mu \to \infty$. However unlike target 
example 6, here there is no radial point mass at $\infty$, $S^2(\mu)$ is bounded and the limiting 
optimal EAR is 0. 

3.3. Asymptotically optimal scaling and EAR 

If $|Y^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$ then for $\mu$ to be optimal we require 

$2\Theta(-\mu) = \mu\Theta'(-\mu)$.    (19) 

There may not always be a solution for $\mu$ (see Section 3.2) but when there is, denote 
this value as $\hat\mu$. Asymptotically optimal scaling is therefore achieved by setting $\mu = \hat\mu$, so 
that rearranging (12) we obtain the following corollary to Theorem 3: 

Corollary 4. Let $\{X^{(d)}\}$ be a sequence of d-dimensional spherically symmetric unimodal 
target distributions and let $\{Y^{(d)}\}$ be a sequence of jump proposal distributions. If there 
exist sequences $\{k_x^{(d)}\}$ and $\{k_y^{(d)}\}$ such that the marginal radial distribution function of 
$X^{(d)}$ satisfies $|X^{(d)}|/k_x^{(d)} \xrightarrow{D} R$ where $R$ has no point mass at 0, $|Y^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$, and 
provided there is a solution $\hat\mu$ to (19) then the asymptotically optimal scaling (AOS) 
satisfies 

$\hat\lambda_d = 2\hat\mu k_x^{(d)}/(d^{1/2}k_y^{(d)})$. 

Target examples 1 and 2 satisfy $|X^{(d)}|/k_x^{(d)} \xrightarrow{p} 1$, with $k_x^{(d)} = d^{1/2}$ and $k_x^{(d)} = d$ 
respectively; we therefore expect an AOTS of $\hat\mu_p \approx 1.19$. For target examples 3 and 4, 
$k_x^{(d)} = 1$, and the AOTSs are, respectively, $\hat\mu \approx 1.67$ and $\hat\mu \approx 2.86$. Figure 4 shows plots 
of the true optimal scale parameter against dimension evaluated numerically using (9). 
The asymptotic approximation given in Corollary 4 appears as a dotted line. Both axes 
are log-transformed and in all cases the finite dimensional optimal scalings are seen to 
approach their asymptotic values as $d$ increases. For the Gaussian target very close 
agreement is attained even in one dimension since the asymptotic Gaussian approximation to 
the marginal radial distribution function is exact for all finite $d$. 

Figure 4. Plots of $\log\hat\lambda$ against $\log d$ for target examples 1-4. Optimal values predicted using 
Corollary 4 appear as dotted lines. 
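A finite dimensional check of Corollary 4 can be sketched for the Gaussian case. The code below assumes a standard Gaussian target and a standard Gaussian jump proposal scaled by $\lambda$, so that $F_{1|d} = \Phi$, $|Y|$ follows a chi distribution with $d$ degrees of freedom, and $k_x^{(d)} = k_y^{(d)} = d^{1/2}$; under these assumptions the AOS is $2\hat\mu_p/d^{1/2} \approx 2.38/d^{1/2}$. It evaluates the finite dimensional forms $\alpha_d(\lambda) = 2E[F_{1|d}(-\tfrac12\lambda|Y|)]$ and $S_d^2(\lambda) = 2\lambda^2E[|Y|^2F_{1|d}(-\tfrac12\lambda|Y|)]$ by midpoint quadrature, which is an illustrative numerical choice.

```python
import math


def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def chi_grid(d, n=2000):
    # Midpoint nodes and normalised weights for |Y| ~ chi_d,
    # whose density is proportional to r^(d-1) exp(-r^2 / 2).
    r_max = math.sqrt(d) + 10.0
    h = r_max / n
    nodes = [(i + 0.5) * h for i in range(n)]
    log_w = [(d - 1) * math.log(r) - 0.5 * r * r for r in nodes]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return nodes, [v / total for v in w]


def ear(lam, nodes, weights):
    # Expected acceptance rate: 2 E[Phi(-lam |Y| / 2)].
    return 2.0 * sum(w * Phi(-0.5 * lam * r) for r, w in zip(nodes, weights))


def esjd(lam, nodes, weights):
    # Expected square jumping distance: 2 lam^2 E[|Y|^2 Phi(-lam |Y| / 2)].
    return 2.0 * lam * lam * sum(
        w * r * r * Phi(-0.5 * lam * r) for r, w in zip(nodes, weights)
    )


def optimal_scaling(d, step=0.01):
    # Search lam = c / sqrt(d) for c in [1.5, 3.5].
    nodes, weights = chi_grid(d)
    candidates = [(1.5 + step * i) / math.sqrt(d) for i in range(201)]
    lam_hat = max(candidates, key=lambda lam: esjd(lam, nodes, weights))
    return lam_hat, ear(lam_hat, nodes, weights)


results = {d: optimal_scaling(d) for d in (1, 10, 100)}
for d, (lam_hat, ear_hat) in results.items():
    print(d, "lam_hat * sqrt(d) =", round(lam_hat * math.sqrt(d), 2),
          "EAR =", round(ear_hat, 3))
```

Comparing `lam_hat * sqrt(d)` with 2.38 across dimensions gives a numerical counterpart to the Gaussian panel of Figure 4.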



Let us now explore the AOA, if it exists, and define $\alpha_\infty(\mu) := \lim_{d\to\infty}\alpha_d(\mu)$. From 
Theorem 2 and Corollary 3(i) 

$\alpha_\infty(\hat\mu) = 2\Theta(-\hat\mu) = 2E[\Phi(-\hat\mu/R)]$, 

where $R$ is the marginal radius of the limit of the sequence of scaled targets. We build 
upon Theorem 3, which explicitly requires that the rescaled marginal radial distribution 
should have no point mass at 0. Following the discussion in Section 3.2 the condition 
that there be an optimal $\hat\mu$ implies that the limiting marginal radius has no point mass 
at infinity. The following is proved in Appendix A.4. 

Theorem 4. Let $\{X^{(d)}\}$ be a sequence of d-dimensional spherically symmetric unimodal 
target distributions and let $\{Y^{(d)}\}$ be a sequence of jump proposal distributions. Let there 
exist $\{k_x^{(d)}\}$ and $\{k_y^{(d)}\}$ such that $|Y^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$ and $|X^{(d)}|/k_x^{(d)} \xrightarrow{D} R$ for some $R$ with no 
point mass at 0. If there is a limiting (non-zero) AOA it is 

$\alpha_\infty(\hat\mu) \le \hat\alpha_p \approx 0.234$. 

Equality is achieved if and only if there exist $k_x^{(d)}$ such that $|X^{(d)}|/k_x^{(d)} \xrightarrow{p} 1$. 

For "standard" proposals, the often used optimal EAR of 0.234 therefore provides 
an upper bound on the possible optimal EARs for spherically symmetric targets, and 
it is achieved if and only if the mass of the target converges (after rescaling) to an 
infinitesimally thin shell. 
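The upper bound can be illustrated with a simple discrete limit radius. The sketch below assumes the limiting forms $2\mu^2E[\Phi(-\mu/R)]$ for efficiency and $2E[\Phi(-\mu/R)]$ for the EAR, and compares a degenerate radius ($R = 1$, the thin shell) with a hypothetical two-point radius taking the values 1 and 3 with equal probability. The two-point law is chosen purely for illustration and is not one of the paper's target examples.

```python
import math


def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def limiting_optimum(radii, probs, upper=20.0, step=0.005):
    # Maximise 2 mu^2 E[Phi(-mu / R)] over a grid for a discrete radius R,
    # then report the limiting EAR 2 E[Phi(-mu / R)] at the maximiser.
    def theta(mu):
        return sum(p * Phi(-mu / r) for r, p in zip(radii, probs))

    n = int(round(upper / step))
    grid = [step * i for i in range(1, n + 1)]
    mu_hat = max(grid, key=lambda mu: 2.0 * mu * mu * theta(mu))
    return mu_hat, 2.0 * theta(mu_hat)


mu_shell, ear_shell = limiting_optimum([1.0], [1.0])
mu_two, ear_two = limiting_optimum([1.0, 3.0], [0.5, 0.5])
print("shell:     mu =", round(mu_shell, 3), "EAR =", round(ear_shell, 3))
print("two-point: mu =", round(mu_two, 3), "EAR =", round(ear_two, 3))
```

With the spread-out radius the optimal transformed scaling is larger and the optimal EAR falls well below the shell value, in line with the inequality in Theorem 4.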



4. Elliptically symmetric distributions 

As discussed in Section 1.2 a unimodal elliptically symmetric target $X$ may be defined in 
terms of an associated orthogonal linear map $T$ such that $X_* := T(X)$ is spherically 
symmetric with unit scale parameter. Since $T$ is linear, the jump proposal in the transformed 
space is $Y_* := T(Y)$. 

The ESJD (4) is preserved under the transformation, since $Y_i^2/\beta_i^2 = Y_{*i}^2$, and thus we 
simply apply Theorem 1 in the transformed space. Write $F_{*1|d}(\cdot)$ for the one-dimensional 
marginal distribution function of spherically symmetric $X_*$. We wish to optimise the ESJD 

$S_d^2(\lambda) := 2\lambda^2 E[|Y_*|^2 F_{*1|d}(-\tfrac12\lambda|Y_*|)]$.    (20) 

Here expectation is with respect to Lebesgue measure $r_*(\cdot)$ of $Y_*$. Acceptance in the 
original space is equivalent to acceptance in the transformed space and the EAR is 
therefore given by 

$\alpha_d(\lambda) = 2E[F_{*1|d}(-\tfrac12\lambda|Y_*|)]$.    (21) 

Corollaries 1 and 2 are now seen to hold for all unimodal elliptically symmetric targets. 
For Corollary 3(i) to be applicable in the transformed space we require there to exist 
$k_x^{(d)}$ and $k_y^{(d)}$ such that 

$|T^{(d)}(X^{(d)})|/k_x^{(d)} \xrightarrow{D} R$ and $|T^{(d)}(Y^{(d)})|/k_y^{(d)} \xrightarrow{m.s.} 1$.    (22) 

Since $X_*^{(d)}$ is spherically symmetric, it is natural to request convergence of $|X_*^{(d)}|/k_x^{(d)}$ 
explicitly in the statement of the theorem. Bédard [2] considers a situation analogous 
to this, but with $R = 1$. The working statistician is free to choose a jump proposal such 
that $|Y^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$. If $Y^{(d)}$ is in fact spherically symmetric this convergence carries 
through to the transformed space provided the eccentricity of the original target is not 
"too severe". A proof of the following appears in Appendix A.5. 

Theorem 5. Let $\{X^{(d)}\}$ be a sequence of elliptically symmetric unimodal targets and 
$\{T^{(d)}\}$ be a sequence of linear maps such that $X_*^{(d)} := T^{(d)}(X^{(d)})$ is spherically symmetric 
with unit scale parameter. Let $\{Y^{(d)}\}$ be a sequence of spherically symmetric proposals 
and let there exist $\{k_x^{(d)}\}$ and $\{k_y^{(d)}\}$ such that 

$|X_*^{(d)}|/k_x^{(d)} \xrightarrow{D} R$ and $|Y^{(d)}|/k_y^{(d)} \xrightarrow{m.s.} 1$. 

Denote by $\nu_i(d)$ the eigenvalues of $T^{(d)}$, and define $k_y^{*(d)} = (\overline{\nu^2}(d))^{1/2}k_y^{(d)}$, where $\overline{\nu^2}(d)$ 
is the arithmetic mean of the $\nu_i(d)^2$. If 

$\nu_{\max}(d)^2 / \sum_{i=1}^d \nu_i(d)^2 \to 0$,    (23) 

then for fixed 

$\mu := \tfrac12 d^{1/2} k_y^{*(d)} \lambda_d / k_x^{(d)}$,    (24) 

the EAR and the ESJD satisfy 

$\alpha_d(\mu) \to 2E[\Phi(-\mu/R)]$,    (25) 

$\frac{d}{4k_x^{(d)2}} S_d^2(\mu) \to 2\mu^2 E[\Phi(-\mu/R)]$,    (26) 

where $\Phi(z)$ is the cumulative distribution function of a standard Gaussian. If in fact 
$R = 1$ then 

$\alpha_d(\mu) \to 2\Phi(-\mu)$,    (27) 

$\frac{d}{4k_x^{(d)2}} S_d^2(\mu) \to 2\mu^2\Phi(-\mu)$.    (28) 

Naturally (28) leads to the same optimal $\hat\mu_p$ as for a spherically symmetric target, so 
the AOA is still approximately 0.234 and the AOS satisfies 

$\hat\lambda_d = 2\hat\mu_p k_x^{(d)}/(d^{1/2}k_y^{*(d)})$. 

Similarly (25) and (26) lead again to $\alpha(\hat\mu) \le \alpha(\hat\mu_p) \approx 0.234$. 



5. Discussion 

We have investigated optimal scaling of the random walk Metropolis algorithm on 
unimodal elliptically symmetric targets. An approach through finite dimensions using 
expected square jumping distance (ESJD) as a measure of efficiency both agrees with and 
extends the existing literature, which is based upon diffusion limits. 

We obtained exact analytical expressions for the expected acceptance rate (EAR) and 
the ESJD in finite dimension $d$. For any RWM algorithm on a spherically symmetric 
unimodal target it was shown that EAR decreases monotonically from 1 to 0 as the 
proposal scaling parameter increases from 0 to $\infty$. This bijective mapping justifies to an 
extent the use of acceptance rate as a proxy for the scale parameter. The theory for finite 
dimensional targets was then shown to extend to elliptically symmetric targets. 

An asymptotic theory was developed for the behaviour of the RWM algorithm as 
dimension $d \to \infty$. It was shown that the asymptotically optimal EAR of 0.234 extends 
to the class of spherically symmetric unimodal targets if and only if the mass of the targets 
converges to a spherical shell that becomes infinitely thin relative to its radius, with a 
similar but slightly stronger condition on the proposal. The optimal acceptance rate was 
then explored for target sequences for which the "shell" condition fails. In such cases the 
asymptotically optimal EAR (if it exists) was shown to be strictly less than 0.234. An 
asymptotic form for the optimal scale parameter showed that the dimension dependent 
rescalings which stabilise the radial mass for both the proposal and target must be taken 
into account. Much of the existing literature (see Roberts et al. [9]) uses independent 
and identically distributed (i.i.d.) target components and i.i.d. proposal components so 
that these two extra effects cancel and $\hat\lambda_d \propto d^{-1/2}$. 

The class for which the limit results are valid was then extended to include all 
algorithms on elliptically symmetric targets such that the same "shell" conditions are satisfied 
once the target has been transformed to spherical symmetry by an orthogonal linear map. 
If the original target is explored by a spherically symmetric proposal then an additional 
constraint applies to the eigenvalues of the linear map, which forbids the scale parameter 
of the smallest principal component from being "too much smaller" than all the other 
scale parameters and is equivalent to the condition of Bédard [2], derived for targets with 
independent components that are identical up to a scaling. 

The optimality limit results are not always valid for targets with at least two very 
different (but important) scales of variation; however, the suitability of the RWM to 
such targets is itself questionable. 

Explicit forms for EAR and ESJD in terms of marginal radial densities were also used 
to explore specific combinations of target and proposal in finite dimensions. Numerical 
and analytical results agreed with our limit theory and with a simulation study in Roberts 
and Rosenthal [10]. 

Appendix 

A.1. Proof of Theorem 1 

The proof of Theorem 1 relies on a partitioning of the space of possible values for $x^*$ 
(and so for $x'$) given $x$ into four disjoint regions: 

• the identity region: $R_{id}(x) := \{x\}$, 

• the equality region: $R_{eq}(x) := \{x' \in \mathbb{R}^d : x' \notin R_{id}(x),\ \pi(x')q(x|x')/(\pi(x)q(x'|x)) = 1\}$, 

• the acceptance region: $R_a(x) := \{x' \in \mathbb{R}^d : \alpha(x,x') = 1,\ x' \notin R_{eq}(x) \cup R_{id}(x)\}$, 

• the rejection region: $R_r(x) := \{x' \in \mathbb{R}^d : \alpha(x,x') < 1\}$. 

Here $\pi(\cdot)$ is the target (Lebesgue) density. For vectors $(x,x')$ in $\mathbb{R}^d \times \mathbb{R}^d$ we employ the 
shorthand $R_{ID} := \{(x,x') : x \in \mathbb{R}^d, x' \in R_{id}(x)\}$, with regions $R_{EQ}$, $R_A$ and $R_R$ defined 
analogously. 

The following lemma holds for almost any Metropolis-Hastings algorithm and allows 
us to simplify the calculations of ESJD and EAR. It is convenient to be able to refer to 
the proposed jump, $Y^* := X^* - X$. 

Lemma 3. Consider any Metropolis-Hastings Markov chain with stationary Lebesgue 
density $\pi(\cdot)$. At stationarity let $X$ denote the current element, $X^*$ the proposed next 
element and $X'$ the realised next element. Let proposals be drawn from Lebesgue density 
$q(x^*|x)$ and assume that 

$\int_{R_{eq}(x)} dx'\, q(x'|x) = 0 \quad \forall x$.    (29) 

Also denote the probability of accepting proposal $x^*$ by $\alpha(x,x^*)$ and the joint laws of 
$(X,X^*)$ and $(X,X')$, respectively, by 

$A^*(dx,dx^*) := \pi(x)\,dx\,q(x^*|x)\,dx^*$    (30) 

and 

$A(dx,dx') := A^*(dx,dx')\,\alpha(x,x')\,1_{\{x' \ne x\}} 
    + \pi(x)\,dx \int dx^*\, q(x^*|x)(1 - \alpha(x,x^*))\,1_{\{x' = x\}}$.    (31) 

Finally let $h(x,x')$ be any function satisfying the following two conditions: 

$h(x,x') = c \times h(x',x) \quad \forall x,x'$ (with $c = \pm 1$),    (32) 
$h(x,x) = 0 \quad \forall x$.    (33) 

Subject to the above conditions: 

1. $E[h(X,X')] = (1 + c) \times \int_{(x,x') \in R_A} dx\,dx'\,\pi(x)q(x'|x)h(x,x')$. 

2. $E[\alpha(X,X^*)] = 2\int_{(x,x^*) \in R_A} dx\,dx^*\,\pi(x)q(x^*|x)$. 

Proof. First note that an exchangeability between the regions $R_a(\cdot)$ and $R_r(\cdot)$ follows 
directly from their definitions 

$x' \in R_a(x) \Leftrightarrow x \in R_r(x')$. 

Consecutively applying this exchangeability, reversibility and the symmetry of $h(\cdot,\cdot)$, we 
find: 

$\int_{(x,x') \in R_R} A(dx,dx')h(x,x') = \int_{(x',x) \in R_A} A(dx,dx')h(x,x')$ 
    $= \int_{(x',x) \in R_A} A(dx',dx)h(x,x')$ 
    $= c \times \int_{(x',x) \in R_A} A(dx',dx)h(x',x)$. 

The set $R_{ID}$ corresponds to the second term in $A(dx,dx')$; this is in general not null 
with respect to $A(\cdot,\cdot)$; however, (33) implies that $h(x,x') = 0$ in $R_{ID}$. Further, $\alpha(x,x^*) = 1$ 
$\forall (x,x^*) \in R_{EQ}$, and (29) holds. Therefore 

$\int_{(x,x') \in R_{ID} \cup R_{EQ}} A(dx,dx')h(x,x') = 0$. 

Thus $R_{ID}$ and $R_{EQ}$ contribute nothing to the overall expectation of $h(X,X')$. Since 
$\alpha(x,x^*) = 1$ $\forall (x,x^*) \in R_A$ the first result then follows. 

The proof of the second result is similar to that of the first and is omitted. □ 

Sherlock [11] shows that for a symmetric proposal (such as the RWM) Lemma 3 may 
be extended to deal with cases where the density contains a series of plateaux and hence 
$R_{EQ}$ is not null. $R_{EQ}$ is then partitioned into a null set and pseudo-acceptance and 
rejection regions that are exactly as would be found if each plateau in fact had a small 
downward slope away from the origin. 

In the region $R_A$, where acceptance is guaranteed, we have $x' = x^*$ and $y = y^*$ so that 
for integrals over $R_A$ we need not distinguish between proposed and accepted values. 
Applying Lemma 3 with $h(x,x') = \|x' - x\|^2 = |y|^2$, we have 

$\alpha_d(\lambda) = \frac{2}{\lambda^d}\int_{R_A} dx\,dy\,\pi(x)r(y/\lambda)$,    (34) 

$S_d^2(\lambda) = \frac{2}{\lambda^d}\int_{R_A} dx\,dy\,|y|^2\pi(x)r(y/\lambda)$.    (35) 

First consider target densities that decrease with strict monotonicity from the mode. In 
this case $R_A$ corresponds to the region where 

$\pi(x + y) > \pi(x) \Leftrightarrow |x + y|^2 < |x|^2 \Leftrightarrow x \cdot \hat{y} < -\tfrac12|y|$,    (36) 

where $\hat{y}$ is the unit vector in the direction of $y$. So 

$(x, x + y) \in R_A \Leftrightarrow y \in \mathbb{R}^d$ and $x \cdot \hat{y} < -\tfrac12|y|$. 

Thus (34) and (35) become 

$\alpha_d(\lambda) = \frac{2}{\lambda^d}\int_{\mathbb{R}^d} dy\,r(y/\lambda)F_{1|d}(-\tfrac12|y|)$, 

$S_d^2(\lambda) = \frac{2}{\lambda^d}\int_{\mathbb{R}^d} dy\,|y|^2r(y/\lambda)F_{1|d}(-\tfrac12|y|)$, 

and the required results follow directly. 

Without strict monotonicity $R_A$ must simply be extended to include the 
"pseudo-acceptance regions" defined in Sherlock [11]. 

A.2. Proof of Theorem 3 

Theorem 3 follows almost directly from the following simple lemma, the proof of which 
follows from a standard measure theory argument that we omit. 

Lemma 4. Let $U_d$ be a sequence of random variables and let $G_d(\cdot) \to G(\cdot)$ be a sequence 
of monotonic functions with $0 \le G_d(u) \le 1$ and $G(\cdot)$ continuous. Then 

$U_d \xrightarrow{m.s.} U \Rightarrow E[G_d(U_d)] \to E[G(U)]$, and 

$U_d \xrightarrow{m.s.} U \Rightarrow E[U_d^2G_d(U_d)] \to E[U^2G(U)]$. 

Now note that $-\tfrac12\lambda_d|Y^{(d)}| = -\mu\,(|Y^{(d)}|/k_y^{(d)}) \times (k_x^{(d)}/d^{1/2})$, whence (5) and (6) become 

$\alpha_d(\mu) = 2E\left[F_{1|d}\left(-\mu\,\frac{|Y^{(d)}|}{k_y^{(d)}}\,\frac{k_x^{(d)}}{d^{1/2}}\right)\right]$ 

and 

$S_d^2(\mu) = \frac{8\mu^2k_x^{(d)2}}{d}\,E\left[\frac{|Y^{(d)}|^2}{k_y^{(d)2}}\,F_{1|d}\left(-\mu\,\frac{|Y^{(d)}|}{k_y^{(d)}}\,\frac{k_x^{(d)}}{d^{1/2}}\right)\right]$. 

Again denote the limiting one-dimensional distribution function corresponding to $R$ by 
$\Theta(x)$. In Lemma 4 substitute $U_d = |Y^{(d)}|/k_y^{(d)}$, $U = Y$, $G_d(u) = F_{1|d}(-\mu k_x^{(d)}u/d^{1/2})$ and 
$G(u) = \Theta(-\mu u)$. Note also that since $G(\cdot)$ and $G_d(\cdot)$ are bounded the convergence in the 
first part of the lemma holds if $U_d \xrightarrow{D} U$. The theorem then follows directly. 



A.3. Proof of Lemma 2 

(i) Pick an arbitrarily small $\delta > 0$ and set $a^* = \max(a, \hat\mu + \delta)$. 

Define $R := (\hat\mu - \delta, \hat\mu + \delta)$ and $T := [0, a^*] \setminus R$. 

Let $m := \max_{\mu \in T} S^2(\mu)$. Since $\hat\mu \in R$ uniquely maximises $S^2(\cdot)$ in $[0, a^*]$, and 
since $T$ is compact, a strict inequality holds: $m < S^2(\hat\mu)$. 

Also $S_d^2(\mu) \to S^2(\mu)$ uniformly on compact $[0, a^*]$ and hence $\exists d_1$ such that 

$|S_d^2(\mu) - S^2(\mu)| < \tfrac12(S^2(\hat\mu) - m) \quad \forall \mu \in [0, a^*]$ and $d > d_1$. 

Hence for any $\mu_b \in T$ and $d > d_1$ 

$S_d^2(\mu_b) < S^2(\mu_b) + \tfrac12(S^2(\hat\mu) - m) \le \tfrac12(S^2(\hat\mu) + m)$ 
    $= S^2(\hat\mu) - \tfrac12(S^2(\hat\mu) - m) < S_d^2(\hat\mu)$. 

Since $\hat\mu_d$ is confined to $[0, a^*]$ it must therefore reside in $R$. 

(ii) Pick an arbitrarily small $\delta > 0$ and set $a^* = \hat\mu + 2\delta$. Proceed exactly as in the 
proof to (i). The interval $(\hat\mu - \delta, \hat\mu + \delta)$ contains a local maximum of $S_d^2$ since 
$S_d^2(\mu_b) < S_d^2(\hat\mu)$ for $d > d_1$ and $\mu_b \in [0, a^*] \setminus (\hat\mu - \delta, \hat\mu + \delta)$. 

(iii) Pick a large $k > 0$ and let $m := \max_{\mu \in [0,k]} S^2(\mu)$. Now $m < \infty$ since $S^2(\mu)$ is 
continuous on compact $[0, k]$. As there is no finite arg max, $\exists \mu_b > k$ such that 
$S^2(\mu_b) = m + \delta$ for some $\delta > 0$. 

Convergence of $S_d^2(\mu)$ to $S^2(\mu)$ is uniform on $[0, \mu_b]$ and therefore $\exists d_1$ such that 

$|S_d^2(\mu) - S^2(\mu)| < \delta/2 \quad \forall \mu \in [0, \mu_b]$ and $d > d_1$. 

Hence for $\mu_a \in [0, k]$ 

$S_d^2(\mu_a) < S^2(\mu_a) + \delta/2 \le S^2(\mu_b) - \delta/2 < S_d^2(\mu_b)$. 

Thus for any $k$ and any $\mu_a \in [0, k]$, for all large enough $d$ there is always a 
$\mu_b > k$ such that $S_d^2(\mu_b) > S_d^2(\mu_a)$ and hence the maximum is achieved at $\hat\mu_d > k$. 
Therefore $\hat\mu_d \to \infty$. 

(iv) Define $a^* = \max(a, \hat\mu)$ and choose an $\epsilon > 0$. Since $S_d^2(\mu) \to S^2(\mu)$ uniformly on 
compact $[0, a^*]$, $\exists d_1$ such that $\forall d > d_1$ and $\mu \in [0, a^*]$, $|S_d^2(\mu) - S^2(\mu)| < \epsilon$. By 
definition $S_d^2(\hat\mu_d) \ge S_d^2(\hat\mu)$, therefore 

$S^2(\hat\mu) - \epsilon < S_d^2(\hat\mu) \le S_d^2(\hat\mu_d) < S^2(\hat\mu_d) + \epsilon \le S^2(\hat\mu) + \epsilon$. 



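A.4. Proof of Theorem 4 

Observe that 

$\Theta'(-\mu) = E\left[\frac{1}{R}\phi\left(-\frac{\mu}{R}\right)\right]$ 

so (19) becomes 

$2E\left[\Phi\left(-\frac{\mu}{R}\right)\right] = E\left[\frac{\mu}{R}\phi\left(-\frac{\mu}{R}\right)\right]$.    (37) 

For a given distribution of $R$, this has solution $\hat\mu$, from which the AOA is 

$\hat\alpha := \alpha_\infty(\hat\mu) = 2E\left[\Phi\left(-\frac{\hat\mu}{R}\right)\right]$. 

Substitute $V := \Phi(-\hat\mu/R)$ so that for $\hat\mu > 0$ and $R > 0$ we have $V \in [0, 0.5]$. Also define 

$h(V) := -\Phi^{-1}(V)\phi(\Phi^{-1}(V))$. 

The AOA is therefore 

$\hat\alpha = 2E[V]$ 

and (37) is satisfied, becoming 

$2E[V] = E[h(V)]$.    (38) 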


But 

$\frac{d^2h}{dV^2} = 2\,\frac{\Phi^{-1}(V)}{\phi(\Phi^{-1}(V))} \le 0$ for $V \in [0, 0.5]$, 

with strict inequality for $V \in (0, 0.5)$ (i.e., $R \in (0, \infty)$). Therefore by Jensen's inequality 

$E[h(V)] \le h(E[V])$.    (39) 

Since $h''(\cdot)$ is strictly negative except at the (finite) end points, equality is achieved if and 
only if all the mass in $V$ is concentrated in one place, $v_0$; this corresponds to all the mass 
in $R$ being concentrated at $-\hat\mu/\Phi^{-1}(v_0)$ and is exactly the situation $|X^{(d)}|/k_x^{(d)} \xrightarrow{p} 1$. 
Substitute $m := -\Phi^{-1}(E[V])$, so that (38) and (39) combine to give 

$2\Phi(-m) \le m\phi(m)$. 

When there is equality the single solution to this equation is $m = \hat\mu_p \approx 1.19$. The 
inequality is strict if and only if $m > \hat\mu_p$, and hence $2\Phi(-m) < 2\Phi(-\hat\mu_p)$. 
Therefore the AOA is 

$\hat\alpha = 2E[V] = 2\Phi(-m) \le 2\Phi(-\hat\mu_p) \approx 0.234$, 

with equality achieved if and only if $|X^{(d)}|/k_x^{(d)} \xrightarrow{p} 1$. 
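The two analytic ingredients of this proof, the second derivative of $h$ and the Jensen step, can be spot-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for $\Phi$, $\phi$ and $\Phi^{-1}$. The two-point radius (values 1 and 3 with equal probability) and the value $\mu = 2$ are arbitrary illustrative assumptions, and the finite difference step is a numerical choice.

```python
from statistics import NormalDist

N = NormalDist()  # standard Gaussian


def h(v):
    # h(v) = -Phi^{-1}(v) * phi(Phi^{-1}(v))
    z = N.inv_cdf(v)
    return -z * N.pdf(z)


def h_second_formula(v):
    # Closed form quoted in the proof: 2 Phi^{-1}(v) / phi(Phi^{-1}(v)).
    z = N.inv_cdf(v)
    return 2.0 * z / N.pdf(z)


def h_second_numeric(v, eps=1e-4):
    # Central second difference.
    return (h(v + eps) - 2.0 * h(v) + h(v - eps)) / (eps * eps)


checks = [(v, h_second_formula(v), h_second_numeric(v)) for v in (0.1, 0.25, 0.4)]
for v, exact, approx in checks:
    print(v, round(exact, 4), round(approx, 4))

# Jensen step for a hypothetical two-point radius R in {1, 3}.
mu = 2.0
radii, probs = [1.0, 3.0], [0.5, 0.5]
v_values = [N.cdf(-mu / r) for r in radii]
mean_v = sum(p * v for p, v in zip(probs, v_values))
mean_h = sum(p * h(v) for p, v in zip(probs, v_values))
print("E[h(V)] =", round(mean_h, 4), " h(E[V]) =", round(h(mean_v), 4))
```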



A.5. Proof of Theorem 5 

Denote the arithmetic mean of the squares of the d-dependent scalar values $\alpha_1(d), \ldots, \alpha_d(d)$ 
by $\overline{\alpha^2}(d)$, and the maximum by $\alpha_{\max}(d)$. The following is proved in Sherlock [11]. 

Lemma 5. Let $S^{(d)}$ be a sequence of orthogonal linear maps on $\mathbb{R}^d$ with eigenvalues 
$\alpha_1(d), \ldots, \alpha_d(d)$ and let $U^{(d)}$ be a sequence of isotropic random variables in $\mathbb{R}^d$. Then 

$|U^{(d)}| \xrightarrow{m.s.} 1 \Rightarrow \frac{|S^{(d)}(U^{(d)})|}{(\overline{\alpha^2}(d))^{1/2}} \xrightarrow{m.s.} 1$ 

provided the eigenvalues of $S^{(d)}$ satisfy 

$\frac{\alpha_{\max}(d)^2}{\sum_{i=1}^d \alpha_i(d)^2} \to 0$.    (40) 

It should be emphasised that condition (40) applies to the map that transforms a 
spherically symmetric random variable to an elliptically symmetric random variable. 

Now apply Lemma 5 with $U^{(d)} = Y^{(d)}/k_y^{(d)}$ and $S^{(d)} = T^{(d)}$ to see that the second half 
of (22) holds with the new rescaling factor $k_y^{*(d)}$. We may therefore apply Corollary 3 in 
the transformed space with $\mu$ as defined in (24). Since EAR and ESJD are invariant to 
the transformation this leads directly to (25)-(28). 
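The content of Lemma 5 and of condition (40) can be illustrated by simulation. The sketch below makes several assumptions that are not in the paper: the map is taken to be diagonal with entries $\alpha_i$, the isotropic vector is $U = Z/d^{1/2}$ with $Z$ standard Gaussian (so that $|U|$ concentrates at 1), and the two eigenvalue patterns are invented for illustration. In the first pattern condition (40) holds; in the second a single eigenvalue dominates and the condition fails.

```python
import math
import random


def scaled_norm(alphas, rng):
    # |S(U)| / (mean of alpha_i^2)^(1/2) for a diagonal map S = diag(alphas)
    # applied to U = Z / sqrt(d), with Z standard Gaussian.
    d = len(alphas)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm_sq = sum((a * zi) ** 2 for a, zi in zip(alphas, z)) / d
    mean_sq = sum(a * a for a in alphas) / d
    return math.sqrt(norm_sq / mean_sq)


def condition_40(alphas):
    # alpha_max^2 / sum_i alpha_i^2, which Lemma 5 requires to vanish.
    return max(a * a for a in alphas) / sum(a * a for a in alphas)


rng = random.Random(1)
d = 5000

mild = [1.0 + i / d for i in range(d)]       # eigenvalues spread over [1, 2)
severe = [float(d)] + [1.0] * (d - 1)        # one dominant eigenvalue

ratio_mild = scaled_norm(mild, rng)
ratios_severe = [scaled_norm(severe, rng) for _ in range(5)]

print("condition (40), mild:  ", condition_40(mild))
print("condition (40), severe:", condition_40(severe))
print("scaled norm, mild:  ", round(ratio_mild, 3))
print("scaled norm, severe:", [round(r, 3) for r in ratios_severe])
```

In the severe case the scaled norm behaves roughly like the absolute value of a single Gaussian coordinate, so it does not concentrate at 1 however large $d$ is.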